In silico identification and functional prediction of differentially expressed genes in South Asian populations associated with type 2 diabetes

Type 2 diabetes (T2D) is one of the major metabolic disorders in humans caused by hyperglycemia and insulin resistance syndrome. Although significant genetic effects on T2D pathogenesis are experimentally proved, the molecular mechanism of T2D in South Asian Populations (SAPs) is still limited. Hence, the current research analyzed two Gene Expression Omnibus (GEO) and 17 Genome-Wide Association Studies (GWAS) datasets associated with T2D in SAP to identify DEGs (differentially expressed genes). The identified DEGs were further analyzed to explore the molecular mechanism of T2D pathogenesis following a series of bioinformatics approaches. Following PPI (Protein-Protein Interaction), 867 potential DEGs and nine hub genes were identified that might play significant roles in T2D pathogenesis. Interestingly, CTNNB1 and RUNX2 hub genes were found to be unique for T2D pathogenesis in SAPs. Then, the GO (Gene Ontology) showed the potential biological, molecular, and cellular functions of the DEGs. The target genes also interacted with different pathways of T2D pathogenesis. In fact, 118 genes (including HNF1A and TCF7L2 hub genes) were directly associated with T2D pathogenesis. Indeed, eight key miRNAs among 2582 significantly interacted with the target genes. Even 64 genes were downregulated by 367 FDA-approved drugs. Interestingly, 11 genes showed a wide range (9–43) of drug specificity. Hence, the identified DEGs may guide to elucidate the molecular mechanism of T2D pathogenesis in SAPs. Therefore, integrating the research findings of the potential roles of DEGs and candidate drug-mediated downregulation of marker genes, future drugs or treatments could be developed to treat T2D in SAPs.


Introduction
Type 2 diabetes (T2D) is a complex metabolic disorder that has sparked much research interest globally.In 2021, 536.6 million people had diabetes, and 783.2 million are expected to have T2D by 2045 [1].Indeed, diabetes cases are also expected to rise among Southeast Asian populations over the next two decades, from 90.2 million in 2021 to 151.5 million people by 2045 [1].In 2019, 1.5 million people worldwide died only due to diabetes [2].In South Asia, Bangladeshi, Pakistani, Nepali, Bhutanese, Maldivian, Sri Lankan, and Singaporean populations showed a significant increase in T2D patients during the last three decades [3].However, Indian populations showed the highest prevalence of diabetes, followed by the Chinese population.In 2025, a case of 69.9 million diabetes is anticipated in India, where the majority are still undiagnosed clinically [4].Certainly, more than 6.3 million Pakistani populations are suffering from diabetes [5].In the Singaporean population, one-third are at risk of diabetes during the whole life span, and the number is anticipated to be increased by more than 1 million by 2050 [6].Similarly, during the last three decades, the prevalence of T2D has significantly increased in Nepali, Sri Lankan, Bhutanese, and Maldivian populations.Not surprisingly, T2D is increasing alarmingly among other SAPs, especially in Bangladesh.Based on age, sex, and disease complexities, the overall prevalence of T2D in the Bangladeshi population ranges from 2.21 to 35% [7], which is very closed to India and China.In 2021, 13.1 million Bangladeshi adults had diabetes, and the cases are anticipated to be 22.3 million by 2045 [1].
T2D is the cumulative effect of genetic and environmental factors.Multiple susceptible genetic signatures responsible for T2D have already been identified in various South Asian countries using genome-wide association studies (GWAS) [8].In South Asian population (SAP), diverse genetic variation, population structure, and disease associations led to inconsistent population-specific medical treatment.Hence, the construction of reference genome databases for specific populations and GWAS among various populations are urgently needed [9].
Due to considerable intergroup cultural differences, SAP possesses a significant genetic diversity [10].Research on Asian human pathogenomics will guide physicians to suggest precise medication for the respective population [11,12].Among SAPs, significant genetic diversity was observed among different ethnicities in the Singaporean population (Chinese, Malay, and Indian).Additionally, 14 potential loci were identified in various Asian and Oceanian populations with solid relationships with complex traits and disorders [11].Consequently, research on human pathogenomics consequences the availability of data on human genetics over a wide range of geographical distribution.[11,13].
Although different genes and loci have already been identified for T2D pathogenesis among SAP, how these genes interact at transcript and protein levels in disease association, different metabolic pathways, biological systems, and drug interaction are still unknown that needs to be elucidated.In addition, extensive genetic diversity among different SAP guides further research on population-specific disease associations [8].Bioinformatic research facilitates analyzing variations in gene expression at the transcript level, which guides to identify of differentially expressed genes (DEGs) and their functional role in T2D pathogenesis [16].Thus, the identification and functional prediction of DEGs, following their role in different metabolic pathways associated with type T2D pathogenesis would be a significant advancement of T2D research in SAP.
Therefore, we have designed the research to identify and predict the functions of DEGs associated with T2D pathogenesis in SAP using GWAS catalog data and gene expression omnibus (GEO) data.These screened DEGs were utilized for further analyses following protein-protein interaction, gene ontology, pathway enrichment, miRNA target regulatory, disease association, and drug-gene interaction to elucidate the mechanisms of T2D pathogenesis among SAPs.

Microarray data
The GEO is a publicly available functional genomic resource that comprises information from chips, microarrays, and high-throughput gene expression investigations.Two microarray datasets (GSE26168 and GSE78721) were obtained from the NCBI database called the GEO database (https://www.ncbi.nlm.nih.gov/gds), and each GSE file was separated into control and disease states.Both datasets were from the South Asian populations (Singaporean and Indian).The dataset of the Singaporean population (GSE26168) is based on the GPL6883 platform, and we have used eight controls and nine T2D-affected samples within 60 samples of the dataset.While the dataset of the Indian population (GSE78721) is based on the GPL15207 platform, and we have used 16 controls and 19 T2D-affected samples within 130 samples of the dataset.The differentially expressed genes (DEGs) in T2D-affected populations were identified using the GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r)database.GEO2R is a web-based application that analyzes two or more GEO datasets to elucidate the DEGs under various experimental conditions.The modified Benjamin and Hochberg's false discovery rate and the P-values were utilized to balance the identification of statistically significant DEGs with limitations of false positives.DEGs in T2D were identified using the fold change value, |log FC| > 1.5 and adj.P 0.05 [17][18][19][20][21].

Protein-protein interaction (PPI) network analysis and identification of Hub genes
The PPI of the translated proteins of identified DEGs was constructed using the publicly available STRING database (https://string-db.org/).After inputting the ID of the identified DEGs onto the STRING database, the species "Homo sapiens" was selected, and the high confidence (0.900) interaction score was set to create the PPI interaction network.Subsequently, Cytoscape software was used to visualize the PPI networks [23][24][25].Then, we have used MCODE (Molecular Complex Detection) (http://apps.cytoscape.org/apps/mcode) and Cytohubba on Cytoscape (http://apps.cytoscape.org/apps/cytohubba)plugins to determine the most critical subnetwork modules.Cytohubba follows topological algorithms to visualize protein associations.MCODE gives clusters of sub-networks.We have set Node Score Cutoff = 0.2, Degree Cutoff = 2, and K-Score = 2 during analysis in MCODE [26].

Gene ontology (GO) enrichment analysis
The GO analysis was done using the ToppFun tool of ToppGene (https://toppgene.cchmc.org/enrichment.jsp) to conduct the functional enrichment of DEGs.We have used the default settings of the ToppGene suite portal with the p-value of 0.05 with corrected an FDR value.In the default setting, Correction value = FDR, p-Value cutoff score = 0.05, Gene Limits 1< = n< = 2000 were maintained.Then, the top ten significant roles in cellular components, biological processes, and molecular functions were presented [27].

Pathway enrichment analysis
The publicly available Web-Gestalt (WEB-based Gene SeT AnaLysis Toolkit) (http://www.webgestalt.org/)database was used to analyze the KEGG pathways enrichment, setting the FDR (false discovery rate) value of 0.05 as the cutoff value [28].The pathway enrichment analysis was done following the Over Representation Analysis (ORA) method, which is one of the three WebGestalt software methods.Homo sapiens was the reference genome during the KEGG pathway enrichment analysis, where Gene Symbol ID was used as the gene ID [29].

Construction of DEGs-miRNA regulatory network
The publicly available miRNet (https://www.mirnet.ca/)database was used to predict the regulatory network of the identified DEGs-miRNAs associated with T2D.In the database, the genes option was selected and gene IDs were inputted to identify DEGs-miRNA association for T2D following the default setting of Homo sapiens species.The regulatory network of DEGs-miRNA was developed and visualized using Cytoscape.The top 30 degrees of the node was chosen to visualize the DEGs-miRNAs regulatory network [30].

Disease association analysis
The Web-Gestalt database (http://www.webgestalt.org/)was used to analyze disease associations of the DEGs following the default setting of the Homo sapiens genome.Web-Gestalt database uses Benjamini and Hochberg approach and a hypergeometric statistical test to determining the false discovery rate.The top ten most significant disease associations were presented [31].

Construction of candidate drug-gene interaction network
The interaction networks of candidate drug-gene were predicted using the publicly available DGIdb database (Drug Gene Interaction Database) (DGIdb, v3.0.2, https://www.dgidb.org/).The candidate drug-gene interaction pairs were obtained using known and FDA-approved drugs.Following that, Cytoscape software was used to analyze and visualize the drug-gene interaction network [32].

Identification of differentially expressed genes (DEGs) from the microarray GEO dataset
The GEO database was used to identify DEGs associated with T2D.Two South Asian GEO datasets (GSE26168 and GSE78721) were analyzed.Here, we have found 221 DEGs in the GSE26168 (GPL6883) dataset (The CDC42 gene showed both up and down-regulated), including 95 upregulated genes and 127 downregulated genes.In the GSE78721 (GPL15207) dataset, 28 DEGs were found, where 26 genes were upregulated and two were downregulated.

Gene ontology enrichment analysis
Gene ontology (GO) is a framework to classify how gene plays a role in molecular functions, biological processes, and cellular components.GO analysis of all 867 potential genes was performed using the ToppGene database.The top 10 significant molecular activities, biological processes, and cellular components were selected (Fig 2).Changes in biological processes were significantly enriched due to positive regulation of RNA metabolic processes, macromolecule biosynthetic processes, transcription, nucleic acid-templated transcription, DNA-and RNAtemplated biosynthetic processes, cellular secretion, peptide hormone secretion, regulation of cell differentiation, and homeostasis of the cell (S3 Table ).The cellular component analysis of DEGs revealed that these genes play a significant role in the vesicle lumen, granule lumen, transcription regulator complex, secretory granule lumen, sarcolemma, cytoplasmic vesicle lumen, secretory vesicle, chromatin, secretory granule, and synapse (S3 Table ).Furthermore, based on the molecular functional analysis, the candidate genes significantly contributed to transcription factor binding, peptide hormone binding, DNA-binding transcription factor binding, kinase binding, DNA-binding transcription activator activity, kinase activity, protein kinase activity, protein kinase binding, DNA-binding transcription activator activity, protein homodimerization activity, and RNA polymerase II-specific activity (S3 Table ).

KEGG pathway enrichment analysis
The analysis of KEGG pathway enrichment based on the highest enrichment ratio and an FDR of < 0.05 revealed that DEGs regulate several metabolic pathways (S1 Fig) .In the analysis, the most enriched pathways were maturity-onset diabetes of the young (MODY), EGFR tyrosine kinase inhibitor resistance, pancreatic cancer, insulin secretion, small cell lung cancer, prostate cancer, transcriptional misregulation in cancer, HIF-1 (Hypoxia-inducible factor 1) signaling pathway, the PI3K-Akt (Phosphoinositide 3-kinases-protein kinase B) signaling pathway, and human papillomavirus infection (Table 1).

Disease association analysis
The disease association analysis was performed to identify the diseases associated with identified DEGs.The results revealed that gestational diabetes, T2D, obesity, hyperglycemia, endocrine system diseases, endocrine disorder NOS (Not Otherwise Specified), endocrine disturbance NOS, and nutritional and metabolic diseases are associated with identified DEGs (Fig 4).Among 867 genes, only 118 genes were significantly associated with T2D pathogenesis (Table 2).Two (HNF1A and TCF7L2), of 118 genes were identified in the hub genes network (Fig 2B ), indicating these two genes have a significant role in T2D pathogenesis and its associated disorders.

Construction of the drug-gene network
Research on drug-gene interaction networks is crucial for drug discovery and development.

Discussion
Comprehensive analysis of the microarray dataset guides expression patterns of DEGs and their integrative biological functions under different conditions in living organisms [31].To identify and characterize DEGs, the raw data should be analyzed after omitting poor-quality measurements, simplifying comparisons, fixing measured intensities, and proper screening [33][34][35].Following that, normalization is usually done to identify significant biological associations of the expression data.Analysis of gene expression under different conditions guides how the gene plays a role in different biological functions [36].
Here, we have screened 1085 DEGs from microarray (GEO) and GWAS catalog datasets after removing overlapped genes.Microarray dataset analysis revealed that 249 genes were associated with T2D pathogenesis, where 121 were upregulated and 128 downregulated (S1 Table ).More specifically, PTGS2 and IL1B genes were upregulated in children with diabetes [37].A number of 29 genes were upregulated, and two were downregulated in patients having acute hyperinsulinemia in skeletal muscle [38].In addition, 109 upregulated and six downregulated genes were observed in T2D [22].Interestingly, 301 upregulated and 680 downregulated genes between T2D and the control population were also observed [39].More specifically, the ABRA gene is upregulated when skeletal muscle is insulin-resistant [40].Here, we have also identified 843 unique genes from 17 GWAS catalog datasets that are associated with T2D pathogenesis in South Asian populations (S2 Table ).Accordingly, 233 unique genes from the GWAS catalog data were found to be associated with T2D [22].Hence, DEGs play critical roles in T2D pathogenesis, which may be pivotal in treating diabetes by investigating their regulatory functions [39,41].This current study is the first comprehensive analysis of the GEO and GWAS catalog datasets among the SAPs and identifies potential DEGs related to T2D.
In the PPI network, we have identified 867 candidate genes that are significantly associated with T2D pathogenesis.In addition, we have found nine hub genes among the 867 candidate genes for T2D pathogenesis.Among the nine hub genes, two genes (CTNNB1, RUNX2) were only found in the South Asian population.However, the remaining seven hub genes (HNF1A, PSMC2, PSMA3, TCF7L2, TLE1, PSMD6, CTBP1) were observed in the South Asian population in addition to other major populations (American, African, East Asian, and European).
The results of GO analysis demonstrated that most biological processes were linked to regulating RNA and DNA-based metabolic and biosynthetic processes.Furthermore, most cellular components were found to be linked with granule and vesicle lumen activities.The result is consistent with our previous research except for differences in population group [47].By binding with homologous DNA and RNA, lncRNAs control gene expression and are linked to various human diseases, including diabetes [48].Hence, the crucial roles of the analyzed DEGs in different biological systems guide how DEGs play a role in T2D pathogenesis modulating cellular, molecular and biological processes in SAP.
DEGs are observed in the secretory-granule lumen, vesicle lumen, and platelet alpha granule lumen tissues in T2D [49].Insulin is secreted in the vesicles called insulin secretory granules (ISGs) that are affected by T2D, consequences of dysfunctional ISG production, and restricted insulin secretion [50].In our analysis, most molecular functions of the identified DEGs were linked to transcription factors and kinase-binding activities (S3 Table ).
MODY is a genetic heritable diabetes characterized by beta-cell dysfunction, non-insulindependent diabetes (NIDD), and autosomal dominant inheritance at a young age [51].It is also known as non-ketotic diabetes, caused due to malfunctioning of pancreatic beta-cell, and lack of pancreatic autoantibodies [52].The epidermal growth factor receptor (EGFR), a tyrosine kinase receptor having a transmembrane domain, is the critical component of cell signaling pathways.The EGFR receptor plays a vital role in the MAPK (Mitogen-activated protein kinase) pathway, the PI3K/AKT (Phosphatidylinositol-3-kinase/Protein kinase B) pathway, and the JAK (Janus kinase) pathway, which stimulates cell proliferation, mitosis, and inhibition of apoptosis [53].The heterodimeric transcription factor HIF-1 is the critical mediator that controls the expression of numerous genes involved in cell cycle regulation, cellular metabolism, angiogenesis, and block of apoptosis [54].In addition, the PI3K/AKT pathway regulates many cellular functions, like cell survival, cancer progression, proliferation, neuroscience, and metabolism [55].In the study, the pathway enrichment analysis showed that the identified DEGs associated with T2D significantly interacted with the MODY, EGFR, PI3K-AKT, and HIF-1 signaling pathways and, therefore, might play a significant regulatory network in the progression of T2D pathogenesis (Table 1 and S1 Fig).
miRNA plays a regulatory role in disease progression through epigenetic modification, histone modification, and DNA methylation.miRNAs are also associated with the diagnosis and response for the treatment of diseases [56].miRNAs are commonly found in all human/mammal cells that are involved in cell development [57] by regulating around 30 percent of the genes that code proteins [58].miRNAs critically regulate post-transcriptional gene expression [59].miRNAs are also involved in glucose homeostasis and regulate the expression of genes involved in diabetes-relevant pathways like the insulin signaling pathway [60][61][62].Some miR-NAs also control the secretion and synthesis of insulin to balance blood glucose levels in human [63].miR-362-3p, miR-15a-5p, miR-150-5p, and miR-877-3p showed significant contributions in T2D pathogenesis [64].Transcripts of hsa-miR-16-5p, hsa-miR-17-5p, hsa-miR-19a-3p, and hsa-miR-20a-5p miRNAs were upregulated in patients having insulin resistance and abnormal pregnancies.Gene-miRNAs regulatory network analysis revealed that these miRNAs significantly regulate MAPK signaling, insulin signaling, TGF-β signaling, and mTOR signaling pathways consequences the progression of T2D [65].Increased expression of has-miR-24-3p plays a crucial role in the pathophysiology and progresses of proliferative diabetic retinopathy [66].Increased expression of has-miR-15a-5p stimulates β cells and promotes insulin production [62].Overexpression of has-miR-27a-3p in L6 cells decreased glucose consumption and glucose uptake and reduced the expression of GLUT4, MAPK 14, and PI3K regulatory subunit [67].miRNA hsa-let-7a-5p was shown to be significantly associated with diabetic retinopathy (DR) in T2D.Overexpression of hsa-let-7a-5p resulted in rapid pathogenesis of DR [68].In our study, we have identified hsa-mir-16-5p, hsa-mir-17-5p, hsamir-24-3p, hsa-mir-27a-3p, hsa-let-7a-5p, hsa-mir-19a-3p, hsa-mir-15a-5p, and hsa-mir-20a-5p that are significantly interacted with the identified DEGs associated with T2D (S4 Table ).Therefore, the interacted miRNAs may cause progression of T2D pathogenies regulating MAPK signaling, insulin signaling, TGF-β signaling, mTOR signaling pathways, and diabetic retinopathy signaling pathways.Furthermore, 367 FDA-approved drugs for T2D significantly downregulated the candidate genes in drug-gene association analysis (S5 Table ).D-phenylalanine derivative nateglinide is an amino acid that stimulates insulin secretion by regulating pancreatic β-cells.It also controls hyperglycemia and improves glycemic control in T2D [69].The another drug carvedilol helps to improve endothelial functions in T2D [70].Interestingly, the T2D drug diclofenac sodium plays a significant role in glycemic control [71], and dipyridamole significantly reduces proteinuria in T2D nephropathy [72].In addition, repaglinide is an efficient anti-diabetic drug [73].In our study, nateglinide significantly downregulated the expression of ABCC8, and KCNJ11 genes (S5 Table ).In addition, repaglinide significantly downregulated the expression of ABCC8, and KCNJ11 genes, carvedilol downregulated ADRB1, and ADRB3 genes (S5 Table ).Indeed, diclofenac and dipyridamole significantly downregulated expression of PPARG, and PDE3A genes respectively (S5 Table ).The case may same for other drugs.Therefore, in addition to above mentioned drugs, other drugs found in the gene-drug interaction network could be used to downregulate the expression of candidate DEGs for treating T2D.Since different drugs interacted with specific genes, the precise drug might be recommended to specific T2D patients after observing genetic mutations, and expression levels of candidate genes for curing T2D, even controlling of progression of prediabetes to T2D [74].

Conclusions
Since genetic signatures play vital roles in T2D pathogenesis, a series of bioinformatic systems were applied to analyze 2 GEO and 17 GWAS catalogue datasets from SAPs to explore the DEGs associated with the diseases.Following critical PPI analysis, 867 DEGs were found to be associated with T2D pathogenesis.Indeed, nine hub genes were also identified for the pathogenesis.Among these, CTNNB1, and RUNX2 could be the markers for T2D pathogenesis in SAPs as only found in that populations.In GO analysis, most of the identified DEGs showed significant contributions in molecular activities, biological processes, and cellular components of T2D.Following KEGG pathway analysis, MODY, EGFR tyrosine kinase inhibitor resistance, insulin secretion, HIF-1, and PI3K-Akt signaling pathways were found to be significantly enriched by the DEGs.Two genes (HNF1A and TCF7L2) among 118 identified genes that significantly contributed to T2D, were found in both hub genes and disease association.Even, 825 DEGs were also interrelated with 2582 miRNAs.Not surprisingly, several miRNAs regulate the expression of a single gene or vice versa (ex, hsa-mir-15a-5p, hsa-mir-16-5p, hsamir-24-3p).Among the 64 genes that interacted with 367 drugs of T2D, ABCC8, ACE, ACHE, ADRB1, ADRB3, BRAF, HTT, INSR, KCNJ11, PDE3A, and PPARG genes were downregulated by wide range of (9-43) FDA approved drugs for T2D.Indeed, different FDA-approved drugs significantly downregulated the expression of target genes.Therefore, the findings of the research might guide to explore the mechanism of how the DEGs progress T2D pathogenesis by interacting different biological functions, pathways, and miRNAs.Considering the abovementioned findings, precise medication could be recommended after diagnosing the molecular mechanism of T2D pathogenesis and observing the expression levels of marker genes in T2D patients among SAPs.

Fig 1 .
Fig 1.Protein-protein interaction network (a) obtained from the STRING database for (867) genes (interaction score > 0.9).The identified hub genes and their interactions (b).Circles represent genes, and lines represent interactions among proteins of differentially expressed genes.

Fig 2 .
Fig 2. The gene ontology enrichment analysis of differentially expressed genes.The orange color represents the biological processes; the green color represents the cellular components and the blue color represents the molecular functions of candidate genes.https://doi.org/10.1371/journal.pone.0294399.g002

Fig 5 .
Fig 5. Drug-gene interaction network.Red dot indicates the genes and blue dot indicates drugs known to inhibit the expression of target genes.https://doi.org/10.1371/journal.pone.0294399.g005